The Knowledge-Gradient Policy for Correlated Normal Beliefs

Authors

  • Peter I. Frazier
  • Warren B. Powell
  • Savas Dayanik
Abstract

We consider a Bayesian ranking and selection problem with independent normal rewards and a correlated multivariate normal belief on the mean values of these rewards. Because this formulation of the ranking and selection problem models dependence between alternatives’ mean values, algorithms may utilize this dependence to perform efficiently even when the number of alternatives is very large. We propose a fully sequential sampling policy called the knowledge-gradient policy, which is provably optimal in some special cases and has bounded suboptimality in all others. We then demonstrate how this policy may be applied to efficiently maximize a continuous function on a continuous domain while constrained to a fixed number of noisy measurements.
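As a rough illustration of the policy the abstract describes, the Python sketch below computes a knowledge-gradient factor for each alternative under a correlated normal belief and applies the standard conjugate update after one noisy measurement. It is a minimal sketch, not the authors' implementation: the names `expected_max_gain`, `kg_factors`, and `update_belief` are hypothetical, the noise variances are assumed known and strictly positive, and the expectation of the maximum is evaluated by building the upper envelope of the lines a_i + b_i z.

```python
import math


def _f(z):
    """f(z) = phi(z) + z * Phi(z), with phi/Phi the standard normal pdf/cdf."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf + z * cdf


def expected_max_gain(a, b):
    """Return E[max_i (a_i + b_i Z)] - max_i a_i for Z ~ N(0, 1)."""
    # Sort lines by slope; among equal slopes keep only the largest intercept.
    lines = []
    for slope, intercept in sorted(zip(b, a)):
        if lines and lines[-1][0] == slope:
            lines[-1] = (slope, intercept)
        else:
            lines.append((slope, intercept))
    hull = []  # lines that appear on the upper envelope, by increasing slope
    cuts = []  # cuts[k] is the z where hull[k] hands over to hull[k + 1]
    for slope, intercept in lines:
        while hull:
            prev_slope, prev_intercept = hull[-1]
            z = (prev_intercept - intercept) / (slope - prev_slope)
            if cuts and z <= cuts[-1]:
                hull.pop()  # previous line is never the maximum
                cuts.pop()
            else:
                break
        if hull:
            cuts.append(z)
        hull.append((slope, intercept))
    return sum((hull[k + 1][0] - hull[k][0]) * _f(-abs(c))
               for k, c in enumerate(cuts))


def kg_factors(mu, cov, noise_var):
    """Knowledge-gradient factor of measuring each alternative once."""
    n = len(mu)
    factors = []
    for x in range(n):
        scale = math.sqrt(noise_var[x] + cov[x][x])  # assumes noise_var[x] > 0
        sigma_tilde = [cov[i][x] / scale for i in range(n)]
        factors.append(expected_max_gain(mu, sigma_tilde))
    return factors


def update_belief(mu, cov, noise_var, x, y):
    """Posterior mean and covariance after observing y at alternative x."""
    n = len(mu)
    denom = noise_var[x] + cov[x][x]
    new_mu = [mu[i] + (y - mu[x]) * cov[i][x] / denom for i in range(n)]
    new_cov = [[cov[i][j] - cov[i][x] * cov[x][j] / denom for j in range(n)]
               for i in range(n)]
    return new_mu, new_cov


# Illustrative prior (made-up numbers): three alternatives, two of them correlated.
mu = [0.0, 0.2, 0.1]
cov = [[1.0, 0.8, 0.0],
       [0.8, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
noise_var = [1.0, 1.0, 1.0]
factors = kg_factors(mu, cov, noise_var)
choice = max(range(len(mu)), key=lambda i: factors[i])  # measure this one next
```

In a fully sequential loop, one would measure `choice`, pass the observed value to `update_belief`, and recompute the factors until the measurement budget is exhausted. Because the covariance couples the alternatives, a single measurement shifts the belief about every correlated alternative, which is the dependence the abstract says the policy exploits.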

Similar resources

Optimal learning for sequential sampling with non-parametric beliefs

We propose a sequential learning policy for ranking and selection problems, where we use a non-parametric procedure for estimating the value of a policy. Our estimation approach aggregates over a set of kernel functions in order to achieve a more consistent estimator. Each element in the kernel estimation set uses a different bandwidth to achieve better aggregation. The final estimate uses a weigh...

The knowledge gradient algorithm for online learning

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. The resulting decision rule easily extends to a variety of settings, including the case where our prior beliefs about the rewards are correlated. Experiments show that the KG policy performs competitively against other learning policies in diverse situations. In the ca...

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multi-armed bandit methods. Experiments show that our KG policy performs competitively against the best known approximation to the opti...

Online Supplement to “The Knowledge-Gradient Policy for Correlated Normal Beliefs”

As discussed in Section 3 of the main paper, the KG policy possesses several optimality and convergence properties. First, it is optimal by construction when N = 1 (Remark 1). Second, the suboptimality gap between the values of the KG and the optimal policies narrows to 0 as N → ∞ (Theorem 4). This is a convergence result, since it shows that when sampling under the KG policy we are guaranteed to...

Factors Affecting the Production of Evidence-Based Policy Documents at the Headquarters of the Ministry of Health and Medical Education

Introduction: Successfully reducing the gap between applied knowledge and pure knowledge depends on identifying the factors that affect it. The objective of the study was to identify the barriers and facilitators to the development of evidence-based papers from the perspective of their producers at the headquarters of the Ministry of Health and Medical Education. Methods: Qualitativ...

Journal:
  • INFORMS Journal on Computing

Volume 21

Publication date: 2009